Abstract: We propose a Capabilities-based approach for 
building long-lived, complex systems that have lengthy devel- 
opment cycles. User needs and technology evolve during these 
extended development periods, and thereby, inhibit a fixed 
requirements-oriented solution specification. In effect, for com- 
plex emergent systems, the traditional approach of baselining 
requirements results in an unsatisfactory system. Therefore, we 
present an alternative approach, Capabilities Engineering, which 
mathematically exploits the structural semantics of the Function 
Decomposition graph — a representation of user needs — to 
formulate Capabilities. For any given software system, the set 
of derived Capabilities embodies change-tolerant characteristics. 
More specifically, each individual Capability is a functional 
abstraction constructed to be highly cohesive and to be minimally 
coupled with its neighbors. Moreover, the Capability set is chosen 
to accommodate an incremental development approach, and to 
reflect the constraints of technology feasibility and implementa- 
tion schedules. We discuss our validation activities to empirically 
prove that the Capabilities-based approach results in change- 
tolerant systems. 

I. Introduction 

In recent times there has been an increase in the
development of large-scale complex systems. Hybrid communication
systems, state-of-the-art defense systems, technologically
advanced aeronautics systems and similar engineering projects 
demand huge investments of time and money. Unfortunately, 
a number of such systems fail or are prematurely abandoned 
despite the availability of necessary resources [1] [2] [3]. A 
major reason for this is that they have lengthy development 
periods during which various factors of change detrimentally 
influence the system. The effect of factors such as changing 
user needs, varying technology constraints and unpredictable 
market demands is further exacerbated by the inherent com- 
plexity of these systems. The primary manifestation of change, 
irrespective of its cause, is in the form of requirements 
volatility; requirements are specifications that dictate system 
development. It is recognized that the impact of requirements 
volatility has far-reaching effects like increasing the defect 
density during the coding and testing phase [4] and affecting 
the overall project performance [5]. 

Traditional Requirements Engineering (RE) attempts to 
minimize change by baselining requirements. This approach is 
successful in small-scale systems whose relative simplicity of 
system functionality and brief development cycles discourages 
changing user perceptions. Furthermore, the inability to foster
new technology in a short time period assures the realization
of systems using initial technology specifications. In contrast, 
traditional RE is ill-equipped to scale the monumental com- 
plexity of large-scale systems and accommodate the dynamics 
of extended development periods. Hence, there is a need for 
an alternative approach that transcends the complexity and 
scalability limits of current RE methods. 

We propose a Capabilities-based approach for building 
large-scale complex systems that have lengthy development 
cycles. Capabilities are change-tolerant functional abstrac- 
tions that are foundational to the composition of system 
functionality. User needs are the primary sources of infor- 
mation about desired system functions. We use a Function 
Decomposition (FD) graph to represent needs, and thereby, 
understand desired system functionalities and their associated 
levels of abstraction. The Capabilities Engineering (CE) 
process mathematically exploits the structural semantics of 
the FD graph to formulate Capabilities as functional abstrac- 
tions with high cohesion and low coupling. The process also 
employs a multi-disciplinary optimization (MDO) approach 
to select optimal sets of Capabilities that accommodate an 
incremental development approach, and reflect the constraints 
of technology feasibility and implementation schedules. Note 
that Capabilities are architected to accommodate specific fac- 
tors of change, viz. requirements volatility and technology 
advancement. We conjecture that the impact of requirements 
volatility is less likely to propagate beyond the affected 
Capability because of its reduced coupling with neighboring 
Capabilities. Additionally, the property of high cohesion helps 
localize the impact of change to within a Capability. The other 
factor of change, technology advancement, is accounted for by 
the conscious assessment of technology feasibility as a part of 
the MDO approach. Therefore, Capabilities are intentionally 
constructed to possess characteristics that accommodate the 
major factors of change. In fact, we envision CE as a possible 
solution to the research challenge of evolving Ultra-Large- 
Scale systems [6]. 

The remainder of the paper is organized as follows: Section 
II discusses characteristics of both large-scale and conventional 
systems, examines general strategies for managing change, 
and provides a review of related work. Section III outlines 
the overall process of CE, and details the formulation and 
optimization of Capabilities. Also, metrics to measure coupling
and cohesion of Capabilities are defined. Section IV
outlines the validation activities and discusses preliminary 
observations. Our conclusions are given in Section V. 

II. Change Management Strategies 

We define complex emergent systems as systems that are 
large-scale, complex, have lengthy development cycles and 
have a lifetime of several decades. A system is said to be 
complex when it consists of a large number of parts that 
interact in a non-trivial manner [7]. The colossal magnitude 
of a large-scale complex system impedes a priori knowledge 
about the effects of these interactions. As a result, the behavioral
characteristics of the overall system are greater than
a mere aggregation of its constituent elements. This behavior 
includes properties that emerge from the elemental interactions 
and are characteristic only of the global system. Specifically, 
it is fallacious to attribute these emergent properties to indi- 
vidual elements of the system [8]. Unlike complex emergent 
systems, conventional systems are smaller-scale, less complex, 
have brief development cycles and have a shorter lifetime. 
Consequently, requirements can be baselined after a certain 
point in the development period. However, requirements and 
technology often evolve during the extended development 
periods of complex emergent systems, and thereby, inhibit a 
comprehensive up-front solution specification. Thus, a primary 
difference between developing conventional software systems 
and complex emergent systems is the lack of a final solution 
specification in the latter case caused by continuous system 
evolution. 

The first law of software evolution [9] asserts that if a
system is to be satisfactory it has to constantly adapt to 
change. One can pursue either of the two following strategies 
to reconcile with change: 

A. Minimize Change 

Traditional RE attempts to minimize change by baselining 
requirements prior to design and development. This mandates 
that needs, the originating source of requirements, be accurate 
and complete. Different elicitation techniques such as interviews,
questionnaires, focus groups, and introspection are employed
to derive needs effectively [10]. Also, numerous models
strive to combat requirements volatility through iterative pro- 
cesses and incremental development [11] [12] [13]. Some de- 
velopment paradigms, like Extreme Programming [14], adopt 
an unconventional approach of eliminating formal RE from 
their process. Agile Modeling proposes lightweight RE [15]. 
Nevertheless, neither has been proven to work, repeatedly, for 
large and complex projects [16]. When building large systems, 
empirical research evidence indicates the failure of traditional 
RE to cope with the attendant requirements evolution [17] 
[18]. Consequently, in the case of complex emergent systems, 
which are often mission-critical, such failures are extremely 
expensive in terms of cost, time and human life. 

B. Accommodate Change 

Performance based specifications [19] [20] were introduced 
with the objective of accommodating instead of minimizing
change. These specifications are statements of requirements
described as outcomes desired of a system from a high level 
perspective. As a result, the solution is constrained to a much 
lesser degree and provides greater latitude in incorporating 
suitable design techniques and technology. More recently, 
Capability Based Acquisition (CBA) [21] [22] is being used to 
resolve problems posed by lengthy development periods and 
increased system complexity. It is expected to accommodate 
change and produce systems with relevant capability and 
current technology by delaying requirement specifications in 
the software development cycle, and maturing a promising 
technology before it becomes a part of the program. 

However, neither Performance based specification nor the 
CBA approach defines the level of abstraction at which a spec- 
ification or Capability is to be described. Furthermore, they 
neglect to outline any scientific procedure for deriving these 
types of specifications from the initial set of user needs. There- 
fore, these approaches propose solutions that are not definitive, 
comprehensive or mature enough to accommodate change 
and benefit the development process of complex emergent 
systems. Nevertheless, they do provide certain key concepts 
— reduced emphasis on detailed requirements specification, 
and nurturing a promising technology before it becomes a part 
of the program — that are incorporated in CE as a part of its 
strategy to accommodate change. 

Similar to Performance based specifications and CBA, 
the CE process utilizes a specification-centric approach to 
accommodate change. It enumerates Capabilities desired of 
the system at various levels of abstraction. Capabilities are 
identified after needs analysis but prior to requirements, and 
indicate the functionality desired of a system at various levels 
of abstraction. This approach complements the research by 
Hevner et al. [23], which focuses on automatically determining 
the functionality of complex information systems from re- 
quirement specifications, design or program implementations. 
Capabilities are formulated to embody high cohesion and 
minimal coupling, and are subjected to an MDO approach 
to induce desirable design characteristics. Embedding desirable
design traits in a specification introduces aspects of a process-centric
approach. Hence, we theorize that CE is a hybrid of the
process-centric and specification-centric approaches
to accommodating change.

III. Capabilities Engineering Process 

The CE process architects Capabilities as highly cohesive, 
minimally coupled functional abstractions that accommodate 
the constraints of technology feasibility and implementation 
schedule. Figure 1 illustrates the two major phases of this
process. Phase I establishes initial sets of Capabilities based 
on their values of cohesion and coupling. These measures are 
mathematically computed from an FD graph, which represents 
the user needs and directives. Directives are system character- 
istics resolved from user needs and assist in the formulation of 
Capabilities. Hence, the two major activities of this phase are 
resolving directives from needs and formulating Capabilities 
using the FD graph. 

Fig. 1. Capabilities Engineering Process



Phase II, a part of our current ongoing research, employs 
an MDO approach on the initial sets of Capabilities to de- 
termine sets that are optimal with respect to the constraints 
of technology and schedule. These optimal Capabilities are 
then mapped to requirements as dictated by an incremental 
development process. Thus, the final set of Capabilities and 
their associated requirements constitutes the output of the 
CE process. Therefore, the major activities of Phase II are 
the optimization of initial Capabilities and the mapping of 
optimized Capabilities to requirements. 

The following sections discuss these phases and their activ- 
ities in detail. 

A. Phase I: Resolution 

Resolution is the process of deriving directives from needs 
using the FD graph. First, we explain the concept of directives 
and the purpose of introducing them. Then, we define the 
elements of an FD graph and enumerate the rules for its 
construction. In the process, the activity of resolution is 
described. 

1) Directives: Needs are elicited using various techniques 
[10] from different user classes to get a complete perspective 
of the system to be built. However, these needs may be 
vague, inconsistent and conflicting. Therefore, we introduce 
the concept of directives to refine and resolve needs, and 
express system characteristics in a more consistent format. We 
define a directive as a characteristic formulated in the problem 
domain language, but described from the system's perspective. 
A directive has three main purposes: 

• Captures domain information: A directive can be in- 
complete, unverifiable, and untestable. However, it serves 
the purpose of describing system functionality in the 
language of the problem domain, which aids in capturing 
domain information. In contrast, a requirement that is 
a statement formulated in the technical language of 
the solution neglects to preserve and convey valuable 
domain information. In fact, Zave and Jackson [24] have 
identified the lack of appropriate domain knowledge in 
the process of requirements refinement as a key problem 
area in RE. Therefore, the introduction of directives 
provides momentum in bridging the gap between needs 
and requirements. 



• Facilitates formulation of Capabilities: Initial sets of 
Capabilities are functional abstractions that have high 
cohesion and low coupling. In order to formulate these 
initial Capabilities we need to examine all possible functional
abstractions of a system. Although directives are
characteristics in the problem domain, they are implicitly 
associated with some functionality desired of the actual 
system. Hence, each functional abstraction is linked with 
a set of directives. In other words, a Capability is associ- 
ated with a specific set of directives. Therefore, directives 
can be used to determine the cohesion and coupling 
values of potential functional abstractions, and thus assist 
in the formulation of Capabilities. 

• Maps to requirements: A directive is affiliated with the
problem domain and a requirement with the solution 
domain; yet both share the same objective of describing 
the characteristics expected of the desired system. In ad- 
dition, they are described at a similar level of abstraction. 
Hence, we conjecture that the mapping of a directive 
to a requirement is straightforward. As Capabilities are 
already associated with a set of directives, this mapping 
process produces requirements that form the output of the 
CE process. Thus, directives assist in the final activity 
(see Figure 1) of mapping Capabilities to requirements.

2) Function Decomposition Graph: Needs are the basis for 
understanding the functionality desired of a system. Often, 
needs are expressed by users at varying levels of abstraction. 
An abstraction presents information essential to a particular 
purpose, ignoring irrelevant details. In particular, a functional 
abstraction indicates the functionality expected of the system 
from a high-level perspective while ignoring minute details. 
We use an FD graph to represent functional abstractions of the 
system, which are obtained by the systematic decomposition 
of user needs (see Figure 2). A need at the highest level
of abstraction is the mission of the system. This can be 
decomposed into other needs. We say that a decomposition of 
needs is equivalent to a decomposition of functions because a 
need essentially represents some functionality of the system. 
Hence, decomposition is an intuitive process of recursively 
partitioning a problem until an atomic level (here a directive) 
is reached. Specifically, the FD graph illustrates user needs in 
terms of desired functionalities and captures their associated 
levels of abstraction. In addition, the structure of the graph 
is reflective of the dependencies between these functionalities. 
Formally, we define an FD graph G = (V, E) as an acyclic 
directed graph where: 

• V is the vertex set that represents system functionality
at various levels of abstraction in accordance with the
following rules:

- Mission: The root represents the highest level mis- 
sion or need of the system. There is exactly one 
overall system mission and hence, only one root node 
in an FD graph. In Figure 2, m is the root node as
indegree(m) = 0 (indegree is the number of edges
coming into a vertex in a directed graph).



- Directive: The leaf node represents a directive of the 
system. A system has a finite number of directives 
and hence, its FD graph also has the same number of 
leaves. In Figure 2, nodes $d_i$, $i = 1 \ldots 14$, represent
directives as $\mathrm{outdegree}(d_i) = 0$ (outdegree is the
number of edges going out of a vertex in a directed 
graph). 

- Functional Abstraction: An internal node repre- 
sents a functionality of the system. The level of ab- 
straction of the functionality is inversely proportional 
to the length of the directed path from the root to 
the internal node representing the concerned functionality.
In Figure 2, nodes $n_i$, $i = 1 \ldots 9$, represent
functional abstractions as $\mathrm{outdegree}(n_i) \neq 0$ and
$\mathrm{indegree}(n_i) \neq 0$.

• $E = \{(u, v) \mid u, v \in V, u \neq v\}$ is the edge set, where each
edge indicates decomposition, intersection or refinement 
relationship between nodes. The edge construction rules 
are described below: 

- Decomposition: The partitioning of a functionality 
into its constituent components is depicted by the 
construction of a decomposition edge. The direct 
edge between a parent and its child node represents 
functional decomposition and implies that the func- 
tionality of the child is a proper subset of the parent's 
functionality. For example, in Figure 2 the edges
$(m, n_1), (m, n_2), (m, n_3), (m, n_4)$ indicate that the
functionality of $m$ is decomposed into smaller functionalities
$n_1, n_2, n_3, n_4$ such that
$m = n_1 \cup_{fn} n_2 \cup_{fn} n_3 \cup_{fn} n_4$, where
$\cup_{fn}$ is the union operation performed on functionality.
Hence, only non-leaf nodes, i.e. internal nodes with an
outdegree of at least two, can have valid decomposition
edges with their children.

- Refinement: The refinement relationship is used 
when there is a need to express a node's function- 
ality with more clarity, say, by furnishing additional 
details. If $\mathrm{outdegree}(u) = 1$, $u \in V$ and $(u, v) \in E$,
then the edge $(u, v)$ represents a refinement relationship;
$v$ is a refined version of its parent $u$. In Figure 2,
nodes $n_4$ and $n_9$ share a refinement relationship.

- Intersection: To indicate the commonalities between 
functions defined at the same level of abstraction the 
intersection edge is used. Hence, a child node with an 
indegree greater than one represents a functionality 
common to all its parent nodes. For example, in 
Figure 2, $n_6$ is a child node of parent nodes $n_1$ and
$n_2$. Consequently, $n_6 = n_1 \cap_{func} n_2$, where
$\cap_{func}$ is the intersection operation performed on
functionality. The edges $(n_1, n_6), (n_3, n_6)$ represent
the intersection relationship.

Figure 2 illustrates an example FD graph. Note that the
directives are the leaf nodes. Initial Capability sets are formulated
from the internal nodes $n_i$, $i = 1, \ldots, 9$, as they
represent functional abstractions, on the basis of their coupling
and cohesion values. The next section defines these measures
and describes their role in formulating Capabilities, the other
activity of Phase I.

Fig. 2. Example FD Graph G = (V, E)
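
The vertex and edge rules for the FD graph can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' tooling: the function name `classify_fd_graph`, the toy graph, and the precedence applied when an edge could match more than one rule (intersection is checked first, then refinement, then decomposition) are all hypothetical choices made for the example. The sketch checks for a single root but does not verify that the graph is acyclic.

```python
from collections import defaultdict


def classify_fd_graph(edges):
    """Label the vertices and edges of an FD graph given as (parent, child) pairs."""
    children = defaultdict(list)
    parents = defaultdict(list)
    vertices = set()
    for u, v in edges:
        children[u].append(v)
        parents[v].append(u)
        vertices.update((u, v))

    # Exactly one mission (root) node is allowed.
    roots = [x for x in vertices if not parents[x]]
    if len(roots) != 1:
        raise ValueError("an FD graph has exactly one mission (root) node")

    vertex_kind = {}
    for x in vertices:
        if not parents[x]:
            vertex_kind[x] = "mission"                 # indegree = 0
        elif not children[x]:
            vertex_kind[x] = "directive"               # outdegree = 0
        else:
            vertex_kind[x] = "functional abstraction"  # internal node

    edge_kind = {}
    for u, v in edges:
        if len(parents[v]) > 1:
            edge_kind[(u, v)] = "intersection"   # child shared by several parents
        elif len(children[u]) == 1:
            edge_kind[(u, v)] = "refinement"     # parent has a single child
        else:
            edge_kind[(u, v)] = "decomposition"  # parent has two or more children
    return vertex_kind, edge_kind


# Hypothetical toy graph (not the graph of Figure 2).
toy_edges = [
    ("m", "a"), ("m", "b"),
    ("a", "d1"), ("a", "c"),
    ("b", "c"), ("b", "r"),
    ("r", "s"),
    ("s", "d4"), ("s", "d5"),
    ("c", "d2"), ("c", "d3"),
]
vertex_kind, edge_kind = classify_fd_graph(toy_edges)
```

In this toy graph, `c` has two parents, so both edges into it are labelled as intersection edges, while `r` has a single child and its outgoing edge is labelled as a refinement.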

B. Phase I: Formulation 

The objective of the formulation activity shown in Figure 1
is to establish initial sets of Capabilities from G. An initial set 
is a meaningful combination of internal nodes and is termed as 
a slice. There can be many slices from a single FD graph. We 
use cohesion (Ch) and coupling (Cp) measures to compute 
the overall cohesion-coupling value, f(Ch, Cp), of each slice. 
This value is used to determine the initial sets of Capabilities 
from all possible slices of G. In this section we first explain 
why we choose to measure cohesion and coupling of nodes. 
Then, we elaborate on each individual measure and its metric. 
Finally, we discuss the construction of a slice and outline the 
process of selecting initial Capabilities based on f(Ch,Cp). 

1) Why cohesion & coupling: Capabilities are formulated 
so as to exhibit high cohesion and low coupling. Techniques 
of modularization suggest that these characteristics are typical 
of stable units [25] [26]. Stability implies resistance to change; 
in the context of CE, we interpret stability as a property that 
accommodates change with minimum ripple effect. Ripple 
effect is the phenomenon of propagation of change from the 
affected source to its dependent constituents [27]. Specifically, 
dependency links between modules behave as change prop- 
agation paths. The higher the number of links, the greater 
is the likelihood of ripple effect. Because coupling is a 
measure of interdependence between modules [28] we choose 
coupling as one indicator of stability of a module. In contrast, 
cohesion, the other characteristic of a stable structure, depicts 
the "togetherness" of elements within a module. Every element 
of a highly cohesive unit is directed toward achieving a single 
objective. We focus on maximizing functional cohesion, which 
indicates the highest level of cohesion [29] among all the other 
levels (coincidental, logical, temporal, procedural, communi- 
cational, and sequential) [26] and therefore, is most desirable. 
In particular, a Capability has high functional cohesion if 
all its constituent elements, viz. directives (later mapped to 
requirements), are devoted to realizing the function represented 
by the Capability. As a general observation, as the cohesion
of a unit increases, the coupling between units decreases.
However, this correlation is not exact [26]. Therefore, we
develop specific metrics to measure the coupling and cohesion 
values of internal nodes in G, and thereby, formulate initial 
sets of Capabilities. 

2) Cohesion Measure: A unit has functional cohesion if 
it focuses on executing exactly one basic function. Yourdon 
and Constantine [26] state that every element in a module 
exhibiting functional cohesion "is an integral part of, and 
is essential to, the performance of a single function". By
virtue of construction, in the FD graph the function of
each child node is essential to achieving the function of its
immediate parent node. Note that neither the root nor the
leaves of an FD graph can be considered as a Capability. This 
is because the root indicates the mission of the system, which 
is too holistic, and the leaves symbolize directives, which are 
too reductionistic in nature. Both of these entities lie on either 
extreme of the abstraction scale, and thereby, conflict with the 
objective of avoiding such polarity when developing complex 
emergent systems [8]. Thus, only the internal nodes of an 
FD graph are considered as potential Capabilities. In addition, 
these internal nodes depict functionalities at different levels of 
abstraction, and thereby, provide a representative sample for 
formulating Capabilities. We develop the cohesion measure for 
internal nodes by first considering nodes whose children are 
only leaves. We then generalize this measure for any internal 
node in the graph. 

a) Measure for internal nodes with only leaves as chil- 
dren: Internal nodes with only leaves as children represent po- 
tential Capabilities that are linked directly to a set of directives. 
In Figure 2, these are nodes $n_5, n_6, n_7, n_8, n_9$. Directives are
necessary to convey and develop an in-depth understanding of
the system functionality and yet, by themselves, lack sufficient 
detail to dictate system development. Failure to implement a 
directive can affect the functionality of the associated Capa- 
bility with varying degrees of impact. We hypothesize that 
the degree of impact is directly proportional to the relevance 
of the directive to the functionality. Consequently, the greater 
the impact, the more essential the directive. This signifies the 
strength of relevance of a directive and is symptomatic of the 
associated Capability's cohesion. Hence, the relevance of a 
directive to the functionality of a unit is an indicator of the 
unit's cohesion. 

The failure to implement a directive can be interpreted 
as a risk. Therefore, we use existing risk impact categories: 
Catastrophic, Critical, Marginal and Negligible [30] to guide 
the assignment of relevance values. Each impact category is 
well-defined and has an associated description. This is used to 
estimate the relevance of a directive on the basis of its potential 
impact. For example, in Table I, negligible impact is described
to be only an inconvenience, whereas a catastrophic impact 
implies complete failure. This signifies that the relevance of a 
directive with negligible impact is much lower when compared 
to a directive with catastrophic impact. Intuitively, the impact 
categories are ordinal in nature. However, we conjecture that 
the associated relevance values are more than merely ordinal. 
The issue of determining the natural measurement scales [31]
of cohesion and other software metrics is an open problem
[32]. Therefore, we refrain from subscribing both the attribute
in question, i.e. cohesion, and its metric, i.e. a function of relevance
values, to a particular measurement scale. Rather than
limiting ourselves to permitted analysis methods as defined 
by Stevens [31] we let the objective of our measurement — 
computing the cohesion of a node to reflect the relevance of 
its directives — determine the appropriate statistic to be used 
[33]. 

We assign values to indicate the relevance of a directive 
based on the perceived significance of each impact category; 
these values are normalized to the [0,1] scale. The categories 
and their associated relevance values are listed in Table I. We
estimate the cohesion of an internal node as the average of the 
relevance values of all its directives. The arithmetic mean is 
used to compute this average as it can be influenced by extreme 
values. This thereby captures the importance of directives 
with catastrophic impact or the triviality of directives with 
negligible impact, and affects the resulting average appropriately
to reflect the same.

TABLE I
Relevance Values

Impact        Description                           Relevance
Catastrophic  Task failure                          1.00
Critical      Task success questionable             0.70
Marginal      Reduction in technical performance    0.30
Negligible    Inconvenience/nonoperational impact   0.10

Every parent-leaf edge is associated with a relevance value
Rel(v, n) indicating the contribution of directive v to the
cohesion of parent node n. For example, in Figure 2,
$Rel(d_1, n_5) = 0.3$. Note that we measure the
relevance of a directive only to its immediate functionality. For
an FD graph G = (V, E) we denote the relevance of a directive d
to its parent node n as Rel(d, n), where $d, n \in V$, $(n, d) \in E$,
$\mathrm{outdegree}(d) = 0$ and $\mathrm{outdegree}(n) > 0$. Formally, the
cohesion measure of a potential Capability that is directly
associated with a set of directives, i.e. the cohesion measure of
an internal node $n \in V$ with $t$ leaves as its children ($t > 0$), is
given by computing the arithmetic mean of relevance values:

$$Ch(n) = \frac{\sum_{i=1}^{t} Rel(d_i, n)}{t}$$

For example, in Figure 2, $Ch(n_1) = 0.525$. The cohesion
value ranges between 0 and 1. A Capability with a maximum
cohesion of 1 indicates that every constituent directive is of 
the highest relevance. 
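
As a small worked illustration of this measure, the following Python sketch computes the arithmetic mean of the Table I relevance values for a node whose children are all directives. The function name `leaf_only_cohesion` and the example node are hypothetical; only the four relevance values come from Table I.

```python
from statistics import fmean

# Relevance values taken from Table I.
RELEVANCE = {
    "catastrophic": 1.00,
    "critical": 0.70,
    "marginal": 0.30,
    "negligible": 0.10,
}


def leaf_only_cohesion(impact_categories):
    """Ch(n) for a node whose children are all directives (leaves)."""
    if not impact_categories:
        raise ValueError("the node needs at least one directive (t > 0)")
    # Arithmetic mean of the relevance values of the node's directives.
    return fmean([RELEVANCE[c] for c in impact_categories])


# Hypothetical node with one directive in each impact category.
example = leaf_only_cohesion(
    ["catastrophic", "critical", "marginal", "negligible"]
)
```

For this hypothetical node the mean is (1.00 + 0.70 + 0.30 + 0.10) / 4 = 0.525, up to floating-point rounding. A node whose directives are all catastrophic would reach the maximum cohesion of 1.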

b) Measure for internal nodes with only non-leaf chil- 
dren: Cohesion measure for internal nodes with only non-leaf 
children is computed differently. This is because the relevance 
value of a directive is valid only for its immediate parent and 
not for its ancestors. For example, the functionality of node
$n_1$ in Figure 2 is decomposed into nodes $n_5$ and $n_6$. This
implies that the functionality of $n_1$ is directly dependent on
the attainment of the functionality of both $n_5$ and $n_6$. Note
that $n_1$ has only an indirect relationship to the directives of
the system. In addition, the degree of influence that $n_5$ and $n_6$
each have on parent $n_1$ is influenced by their size (number of
constituent directives). Therefore, the cohesion of nodes that 
are parents with non-leaf children is a weighted average of the 
cohesion of their children. Here, the weight is the size of a 
child node in terms of its constituent directives. This indicates 
the child's contribution towards the parent's overall cohesion. 
The rationale behind this is explained by the definition of 
cohesion, which states that a node is highly cohesive if every 
constituent element is focused on the same objective, i.e. the 
node's functionality. 

Formally, the cohesion measure of an internal node n with
t > 1 non-leaf children is:

$$Ch(n) = \frac{\sum_{i=1}^{t} size(v_i) \cdot Ch(v_i)}{\sum_{i=1}^{t} size(v_i)}$$

such that $(n, v_i) \in E$ and,

$$size(n) = \begin{cases} \sum_{i=1}^{t} size(v_i) & (n, v_i) \in E;\ \mathrm{outdegree}(v_i) > 0 \\ 1 & \mathrm{outdegree}(n) = 0 \end{cases}$$

In the case where outdegree(n) = 1, i.e. the node has
only one child v (say), then Ch(n) is Ch(v); if
outdegree(n) = 0, i.e. n is a leaf (directive), Ch(n) is not
applicable.
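
The weighted average can be sketched recursively in Python. This is an illustrative reading rather than the authors' implementation. It assumes that the size of a node whose children are all directives equals its number of directives (following the description of size as the number of constituent directives), and it rejects nodes that mix leaf and non-leaf children, since that case is not covered by the two measures described so far. The names `size`, `cohesion`, `children` and `relevance`, and the example values, are hypothetical.

```python
def size(node, children):
    """Number of constituent directives below a node; a directive has size 1."""
    kids = children.get(node, [])
    if not kids:
        return 1
    return sum(size(k, children) for k in kids)


def cohesion(node, children, relevance):
    """Ch(node), where relevance[(d, n)] holds Rel(d, n) for each parent-leaf edge."""
    kids = children.get(node, [])
    if not kids:
        raise ValueError("cohesion is not applicable to a directive (leaf)")
    leaf_flags = [not children.get(k) for k in kids]
    if all(leaf_flags):
        # Only leaves as children: arithmetic mean of relevance values.
        return sum(relevance[(k, node)] for k in kids) / len(kids)
    if any(leaf_flags):
        raise ValueError("mixed leaf and non-leaf children are outside this sketch")
    # Only non-leaf children: average of child cohesion weighted by child size.
    weights = [size(k, children) for k in kids]
    values = [cohesion(k, children, relevance) for k in kids]
    return sum(w * c for w, c in zip(weights, values)) / sum(weights)


# Hypothetical example: p decomposes into x (2 directives) and y (4 directives).
children = {
    "p": ["x", "y"],
    "x": ["d1", "d2"],
    "y": ["d3", "d4", "d5", "d6"],
}
relevance = {
    ("d1", "x"): 1.0, ("d2", "x"): 0.7,
    ("d3", "y"): 0.3, ("d4", "y"): 0.3,
    ("d5", "y"): 0.1, ("d6", "y"): 0.1,
}
ch_p = cohesion("p", children, relevance)
```

Here Ch(x) = 0.85 with size 2 and Ch(y) = 0.2 with size 4, so Ch(p) = (2 × 0.85 + 4 × 0.2) / 6, roughly 0.417. The larger child pulls the parent's cohesion toward its own value, which is the intent of weighting by size.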

3) Coupling Measure: As with cohesion, the concept of 
coupling was introduced by Stevens et al. [28] as the "measure 
of the strength of association established by a connection from 
one module to another". Coupling is also characterized as the 
degree of interdependence between modules. The objective 
of CE is to identify minimally coupled nodes as initial 
Capabilities. A Capability is related to another Capability only 
through its constituent directives, i.e. the coupling between 
Capabilities is the measure of the dependencies between 
their respective directives. Thus, we first discuss the coupling 
between directives, and then develop the coupling measure for 
Capabilities. 

a) Coupling between directives: Generally, metrics that 
measure coupling between modules utilize data from the 
source code or the system design. However, to measure the 
coupling between directives we have neither design infor- 
mation nor implementation details at our disposal. We only 
have the structural information provided by the FD graph. In 
particular, we consider an undirected version G' (shown in
Figure 3) of the FD graph G where G' = (V, E') and E'
is the set of undirected edges. We denote coupling between
directives (leaf nodes) $d_x$ and $d_y$ as $Cp(d_x, d_y)$. Note that
$Cp(d_x, d_y) \neq Cp(d_y, d_x)$ as $Cp(d_x, d_y)$ is the dependency of
$d_x$ on $d_y$, which can be quantified by measuring the effect
on $d_x$ when $d_y$ changes. Similarly, $Cp(d_y, d_x)$ indicates the
dependency of $d_y$ on $d_x$. In general, we hypothesize coupling
as a function of two components: distance and probability of 
change. 



• Distance: We know that directives associated with the
same Capability are highly functionally related. In G',
this is represented by leaves that share the same parent
node. However, relatedness between directives decreases
with increasing distance between them. We define distance
between directives $u, v \in V$ as the number of edges
in the shortest undirected path between them and denote
it as dist(u, v). By choosing the shortest path we account 
for the worst case scenario of change propagation. Specif- 
ically, the shorter the distance, the greater the likelihood 
of impact due to change propagation. 
In Figure 3, $d_1$ and $d_2$ are directives of the same parent
$n_5$ and so are highly related, with $dist(d_1, d_2) = 2$.
In contrast, $d_1$ and $d_9$ have a lower relatedness, with
$dist(d_1, d_9) = 6$, as they are connected only through
common ancestors. The shortest paths connecting $d_1$ and
$d_2$, and $d_1$ and $d_9$, are highlighted in Figure 3. Thus, from
the distance measure we conclude that $d_1$ is less likely
to be affected by a change in $d_9$ than by a change in $d_2$.
Consequently, $Cp(d_1, d_9) < Cp(d_1, d_2)$. Hence, for any
two directives u and v we deduce:

$$Cp(u, v) \propto \frac{1}{dist(u, v)}$$

Fig. 3. Undirected FD Graph G' = (V, E') 

• Probability of Change: We are interested in choosing 
internal nodes that are minimally coupled as initial Capa- 
bilities. Minimal interconnections reduce the likelihood 
of a ripple effect phenomenon. We know that coupling 
between Capabilities is a function of coupling between 
their respective directives. As mentioned earlier, if u
and v are directives then Cp(u, v) can be quantified by
measuring the effect on u when v changes. However, we
still need to compute the probability of occurrence of such
a ripple effect. This implies computing the probability
that a directive might change. Therefore, Cp(u, v) also
needs to factor in the probability of directive v changing:
P(v). Consequently, the coupling between two directives
u and v is computed as:

$$Cp(u, v) = \frac{P(v)}{dist(u, v)}$$

This metric signifies the coupling between directives u
and v as the probability that a change in v propagates
through the shortest path and affects u.
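
The directive-level measure can be illustrated with a minimal Python sketch. It assumes the undirected graph G' is available as an adjacency map and that a change probability P(v) has already been assigned to each directive. The small graph, the node names and the probability values below are hypothetical and are not taken from Figure 3.

```python
from collections import deque


def shortest_dist(adj, u, v):
    """Number of edges on the shortest undirected path from u to v (BFS)."""
    if u == v:
        return 0
    seen = {u}
    queue = deque([(u, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in adj[node]:
            if nxt == v:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    raise ValueError(f"{u} and {v} are not connected")


def directive_coupling(adj, u, v, p_change):
    """Cp(u, v) = P(v) / dist(u, v): the dependency of directive u on v."""
    if u == v:
        raise ValueError("coupling is defined between two distinct directives")
    return p_change[v] / shortest_dist(adj, u, v)


# Hypothetical FD graph: root r with internal nodes a and b;
# a has leaves d1 and d2, b has leaf d3.
edges = [("r", "a"), ("r", "b"), ("a", "d1"), ("a", "d2"), ("b", "d3")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

# Assumed change probabilities, chosen only for illustration.
p_change = {"d1": 0.5, "d2": 0.5, "d3": 0.25}

cp_siblings = directive_coupling(adj, "d1", "d2", p_change)  # dist = 2
cp_distant = directive_coupling(adj, "d1", "d3", p_change)   # dist = 4
cp_reverse = directive_coupling(adj, "d3", "d1", p_change)   # uses P(d1)
```

In this toy graph the sibling directives are more strongly coupled than the distant pair, and Cp(d1, d3) differs from Cp(d3, d1) because the two directives were given different change probabilities.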

b) Coupling between Capabilities: Capability p is cou- 
pled with Capability q if a change in q affects p. Note that 
Cp(p, q) is the measure that p is coupled with q and so,
$Cp(p, q) \neq Cp(q, p)$. In particular, a change in q implies a
change in one or more of its constituent directives. Therefore, 
the coupling measure for Capabilities is determined by the 
coupling between their respective directives. However, it is 
possible that p and q share common directives. In such a 
case, we need to make a decision about the membership of 
these directives and ensure that they belong to exactly one 
Capability. This is reflective of the actual system, where any 
functionality is implemented only once and is not duplicated. 
Criteria such as the relevance value of the directive or its 
contribution to the overall cohesion may be used to resolve 
this issue. For now, we use the former criterion.

In terms of G we define the set of leaves (directives)
associated with an internal node n as:

$$D_n = \{x \mid \exists\, path(n, x);\ outdegree(x) = 0;\ n, x \in V\}$$

where path(n, x) is a set of directed edges connecting n and
x. For example, the set of leaves associated with the internal
node $n_3 \in V$ is $D_{n_3} = \{d_{10}, d_{11}, d_{12}, d_{13}, d_{14}\}$. Now consider
$Cp(n_5, n_6)$, from Figure 3, which is the coupling between
internal nodes $n_5$ and $n_6$. $Cp(n_5, n_6)$ quantifies the effect
on $n_5$ when $n_6$ changes, i.e. we need to compute the effect on
the directives associated with $n_5$ ($d_1, d_2, d_3$) when the directives
associated with $n_6$ ($d_4, d_5$) change. We compute coupling by
associating the common directive $d_3$ with $n_5$ and not $n_6$,
because $Rel(d_3, n_5) > Rel(d_3, n_6)$. We use the relevance
value to decide the membership of a directive. Therefore,
$D_{n_5} = \{d_1, d_2, d_3\}$ and $D_{n_6} = \{d_4, d_5\}$. The coupling
between $n_5$ and $n_6$ is given by:



$$Cp(n_5, n_6) = \frac{\sum_{d_i \in D_{n_5}} \sum_{d_j \in D_{n_6}} Cp(d_i, d_j)}{|D_{n_5}| \cdot |D_{n_6}|}$$

where $|D_{n_5}|$ is the cardinality of $D_{n_5}$.

Generalizing, the coupling measure between any two internal
nodes $p, q \in V$, where $outdegree(p) > 1$, $outdegree(q) > 1$
and $D_p \cap D_q = \emptyset$, is:

$$Cp(p, q) = \frac{\sum_{d_i \in D_p} \sum_{d_j \in D_q} Cp(d_i, d_j)}{|D_p| \cdot |D_q|}$$

where $Cp(d_i, d_j) = \frac{P(d_j)}{dist(d_i, d_j)}$ and $P(d_j) = \frac{1}{|D_q|}$.

$P(d_j)$ is the probability that directive $d_j$ changes among all
other directives associated with the node q.
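
The Capability-level measure can be sketched in Python as follows, assuming that the directive sets D_p and D_q have already been made disjoint and that P(d_j) = 1/|D_q| as defined above. The graph and node names are hypothetical and do not reproduce Figure 3.

```python
from collections import deque
from itertools import product


def shortest_dist(adj, u, v):
    """Number of edges on the shortest undirected path from u to v (BFS)."""
    if u == v:
        return 0
    seen = {u}
    queue = deque([(u, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in adj[node]:
            if nxt == v:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    raise ValueError(f"{u} and {v} are not connected")


def capability_coupling(adj, leaves_p, leaves_q):
    """Cp(p, q): average of P(d_j) / dist(d_i, d_j) over D_p x D_q."""
    if set(leaves_p) & set(leaves_q):
        raise ValueError("D_p and D_q must be disjoint")
    p_change = 1.0 / len(leaves_q)  # P(d_j) = 1 / |D_q|
    total = sum(
        p_change / shortest_dist(adj, di, dj)
        for di, dj in product(leaves_p, leaves_q)
    )
    return total / (len(leaves_p) * len(leaves_q))


# Hypothetical FD graph: root r with internal nodes a and b;
# a owns three directives and b owns two.
edges = [("r", "a"), ("r", "b"),
         ("a", "d1"), ("a", "d2"), ("a", "d3"),
         ("b", "d4"), ("b", "d5")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

cp_ab = capability_coupling(adj, ["d1", "d2", "d3"], ["d4", "d5"])
cp_ba = capability_coupling(adj, ["d4", "d5"], ["d1", "d2", "d3"])
```

Every cross pair in this toy graph is four edges apart, so the two values differ only because the nodes own different numbers of directives.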

4) Initial Capabilities Sets: The cohesion and coupling 
measures are used to formulate initial sets of Capabilities
from the FD graph. However, prior to the application of these
measures, we determine what combinations of internal nodes
are meaningful enough to be considered as Capabilities. For
example, in FD graph G of Figure 2, the set $\{n_1, n_5, n_6\}$ is an
unsound combination of Capabilities as they are a redundant
portrayal of only a part of the system functionality. Recall
that Capabilities are functional abstractions that form the 
foundation of a complex emergent system, and thereby, need 
to be formulated with sound principles and rules. 

We first identify valid combinations of internal nodes termed 
slices from an FD graph. Then, we apply the measures of 
coupling and cohesion on these slices to determine the initial 
sets of Capabilities. Note that each node of a slice is a potential 
Capability. For an FD graph G = (V, E) we define slice S as 
a subset of V where the following constraints are satisfied: 

1) Complete Coverage of Directives: We know that a Capability
is associated with a set of directives, which are
finally mapped to system requirement specifications (see
Figure 1). Consequently, a set of initial Capabilities of
the system has to encompass all the directives resolved
from user needs. The leaves of the FD graph constitute
the set of all directives in a system. We ensure that
each directive is accounted for by some Capability, by
enforcing the constraint of complete coverage given by

$$\bigcup_{i=1}^{m} D_i = L$$

where

• $D_i$ denotes the set of leaves associated with the i-th
node of slice S

• $L = \{u \in V \mid outdegree(u) = 0\}$ denotes the set of
all leaves of G

• $m = |S|$

2) Unique Membership for Directives: In the context of
directives, by ensuring that each directive is uniquely
associated with exactly one Capability we avoid implementing
redundant functionality. Otherwise, the purpose
of using slices to determine Capabilities as unique
functional abstractions is defeated. We ensure the unique
membership of directives by the constraint

$$\bigcap_{i=1}^{m} D_i = \emptyset$$

3) System Mission is not a Capability: The root is the high 
level mission of the system and cannot be considered 
as a Capability. The cardinality of a slice containing the 
root can only be one. This is because including other 
nodes with the root in the same slice violates the second 
constraint. Hence, $\forall u \in S,\ indegree(u) \neq 0$.

4) Directive is not a Capability: A leaf represents a di- 
rective, which is a system characteristic. A slice that 
includes a leaf fails to define the system in terms of its 
functionality and focuses on describing low level details. 
Hence, $\forall u \in S,\ outdegree(u) \neq 0$.

For example, $S = \{n_1, n_7, n_3\}$ is a valid slice for the graph
illustrated in Figure 2. Note that criteria such as the relevance
value of a directive or its contribution to the associated 
node's cohesion value are used to decide the membership of a 
directive so that it is unique, satisfying the second constraint. 
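
The four slice constraints can be checked mechanically. The following Python sketch assumes the FD graph is given as a map from each node to its children, and that shared directives are resolved through an optional `owner` map (a hypothetical stand-in for the relevance-based membership decision). It tests uniqueness pairwise, which is the stricter reading of the second constraint. The example graph is hypothetical and is not the graph of Figure 2.

```python
def leaves_under(children, n):
    """D_n: every leaf reachable from n along directed edges."""
    found, seen, stack = set(), set(), [n]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        kids = children.get(node, [])
        if not kids:
            found.add(node)
        stack.extend(kids)
    return found


def is_valid_slice(children, slice_nodes, owner=None):
    """Check the four slice constraints for a candidate set of nodes."""
    owner = owner or {}
    has_parent = {c for kids in children.values() for c in kids}
    nodes = set(children) | has_parent
    all_leaves = {n for n in nodes if not children.get(n)}

    # Constraints 3 and 4: neither the root nor a leaf may be in a slice.
    for u in slice_nodes:
        if u not in has_parent or u in all_leaves:
            return False

    # A shared directive counts only for the node named in `owner`.
    sets = [
        {x for x in leaves_under(children, u) if owner.get(x, u) == u}
        for u in slice_nodes
    ]
    covered = set().union(*sets)

    # Constraint 2: no directive is counted twice across the slice.
    unique = sum(len(d) for d in sets) == len(covered)
    # Constraint 1: every directive of the graph is covered.
    return unique and covered == all_leaves


# Hypothetical FD graph; directive d3 is shared by n3 and n4.
children = {
    "r": ["n1", "n2"],
    "n1": ["n3", "n4"],
    "n2": ["d5", "d6"],
    "n3": ["d1", "d2", "d3"],
    "n4": ["d3", "d4"],
}

coarse_ok = is_valid_slice(children, ["n1", "n2"])
shared_unresolved = is_valid_slice(children, ["n3", "n4", "n2"])
shared_resolved = is_valid_slice(children, ["n3", "n4", "n2"], owner={"d3": "n3"})
redundant = is_valid_slice(children, ["n1", "n3", "n2"])
```

Here the slice containing both n3 and n4 becomes valid only once the shared directive d3 is assigned to a single node, and the combination of n1 with its own descendant n3 is rejected as redundant.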



Let p > 1 be the number of slices computed from graph
G. We use the previously defined measures to rank the slices 
based on their values of coupling and cohesion. Based on this 
ranking we determine initial sets of Capabilities. 

Let $Ch_i$ and $Cp_i$ denote the cohesion and coupling values
of the i-th node of slice $S_j$ respectively, where $S_j = \{n_i \mid 1 \leq
i \leq q,\ q = |S_j|\}$, $1 \leq j \leq p$. We compute $f_j(Ch, Cp)$, a
function of the cohesion and coupling values of all nodes in 
Sj to represent the overall cohesion-coupling value of the slice. 
We rank the p slices based on their cohesion-coupling value 
f(Ch, Cp) and choose those slices with an above average 
value as initial sets of Capabilities. 
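
The ranking step can be sketched in Python as follows. The form of f(Ch, Cp) is not fixed in the text, so this sketch assumes, purely for illustration, that it is the mean cohesion minus the mean coupling of the nodes in a slice. The slice names and the per-node values are hypothetical.

```python
from statistics import mean


def slice_score(nodes):
    """Assumed f(Ch, Cp): mean cohesion minus mean coupling of a slice.

    `nodes` is a list of (cohesion, coupling) pairs, one per node.
    """
    return mean(ch for ch, _ in nodes) - mean(cp for _, cp in nodes)


def initial_capability_sets(slices):
    """Rank slices by score and keep those scoring above the average."""
    scores = {name: slice_score(nodes) for name, nodes in slices.items()}
    average = mean(scores.values())
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] > average]


# Hypothetical (cohesion, coupling) values for the nodes of three slices.
slices = {
    "S1": [(0.8, 0.1), (0.6, 0.1)],
    "S2": [(0.75, 0.25), (0.75, 0.25)],
    "S3": [(0.75, 0.5)],
}

selected = initial_capability_sets(slices)
```

With these values S1 and S2 score above the average and are retained as initial sets, while S3 is dropped. A different choice of f would change the ranking, so the function is kept separate from the selection logic.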

The initial sets of Capabilities, described above, form the 
output of Phase I. The next phase of the CE process, as shown
in Figure 1, is Phase II. In Phase II, we apply an MDO
approach on the initial sets to determine the optimal set
of Capabilities. The optimized Capabilities and their associated
directives are then mapped to system requirements. Thus, 
optimization and mapping are the two major activities of Phase 
II. 

C. Phase II: Optimization 

The initial sets of Capabilities are in essence, combinations 
of internal nodes selected from the FD graph. These Capabil- 
ities and their directives possess valuable domain knowledge 
and represent user needs. However, they are too coarse and un- 
refined to dictate system development in the solution domain. 
Hence, we need to optimize these Capabilities with respect 
to constraints germane to complex emergent system devel- 
opment. In particular, we focus on three specific constraints: 
overall cohesion-coupling value f(Ch, Cp), technology fea- 
sibility tf and schedule sched(order,time). The initial set 
whose values of f(Ch,Cp), tf and sched(order,time) are 
optimal when compared to all other sets of Capabilities is 
selected using the MDO approach. Conceptually, we desire to 
maximize the objective function z subject to the previously 
defined constraints. This is described as: 

Objective Function:

Maximize $z(f(Ch, Cp),\ tf,\ sched(order, time))$

Constraints:

$tf \geq tf_{MIN}$
$sched \leq sched_{MAX}$
$f(Ch, Cp) \geq f_{MIN}(Ch, Cp)$

Note that the values $tf_{MIN}$, $sched_{MAX}$ and $f_{MIN}(Ch, Cp)$ can be
defined by the user. Figure 4 illustrates this conceptually,
depicting the feasible region. Since we have already discussed 
f(Ch, Cp), we now explain the other two constraints: tf 
relating to technology advancement and sched(order, time) 
derived from the implementation schedule. 

a) Technology Advancement: We examine two possible 
scenarios, caused by technology advancement, when incorpo- 
rating technology in a system: 

• Technology Obsolescence: Given the rapid rate of hardware
advancements, a lengthy development period of a
complex emergent system can render the initial technology
requirements invalid. Consequently, the technology
of an existing Capability becomes obsolete.
• Technology Infusion: The functionality expected of a
system may undergo substantial modification over a long
period of time requiring the introduction of new Capabilities.
This results in the infusion of new technology into
the existing system.

Fig. 4. Feasible Region

Intuitively, we know that by minimizing the coupling between 
Capabilities, the impact of change relative to technology 
advancement is reduced. In addition, we hypothesize that 
cohesion also plays a vital role. In Capabilities with high 
cohesion every element is highly focused on a single function. 
Consequently, elements of a functionally cohesive Capability 
are strongly tied to the underlying technology, as this technol- 
ogy assists in implementing the functionality. Hence, replacing 
technology of an existing Capability is easier when it is highly 
cohesive. We use the term tf, i.e. technology feasibility, to 
indicate the feasibility of currently available technology to 
implement an initial set of Capabilities. More specifically, $tf_S^i$
is the feasibility of the available technology to satisfactorily 
develop slice S at the time instant i. 

b) Schedule: Similar to technology feasibility we also 
consider the implementation schedule as being a constraint 
on selecting slices. We theorize that schedule is a function of 
order and time; sched(order, time). Order is the sequence 
in which the Capabilities of a slice need to be developed. 
In particular, it is likely that certain functionalities have a 
higher priority of development than others. Hence, the order 
of developing functionalities is crucial in the selection of 
Capabilities. Furthermore, some functionalities may have to 
be implemented within a specific time period. Thus, time 
is also a factor in determining the schedule. In conclusion, 
when selecting slices we focus on the constraints of coupling- 
cohesion, technology feasibility and schedule, to combat the 
factors of change viz. volatility and technology advancement. 
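
The selection described in this section can be sketched as a filter over the feasible region followed by a maximization. This Python sketch is illustrative only: the objective z is not specified in the text, so `z_example` is an assumed combination, and sched(order, time) is collapsed into a single assumed duration. The candidate values and thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    f: float      # overall cohesion-coupling value f(Ch, Cp)
    tf: float     # technology feasibility
    sched: float  # assumed scalar schedule estimate, e.g. in months


def select_optimal(candidates, f_min, tf_min, sched_max, z):
    """Return the feasible candidate that maximizes z, or None."""
    feasible = [
        c for c in candidates
        if c.f >= f_min and c.tf >= tf_min and c.sched <= sched_max
    ]
    return max(feasible, key=z, default=None)


def z_example(c):
    """Assumed objective: reward f and tf, penalize a longer schedule."""
    return c.f + c.tf - c.sched / 100.0


# Hypothetical initial sets of Capabilities.
candidates = [
    Candidate("A", f=0.60, tf=0.90, sched=30),
    Candidate("B", f=0.70, tf=0.40, sched=20),  # fails tf >= tf_MIN
    Candidate("C", f=0.50, tf=0.80, sched=18),
    Candidate("D", f=0.65, tf=0.85, sched=40),  # fails sched <= sched_MAX
]

best = select_optimal(candidates, f_min=0.45, tf_min=0.5, sched_max=36, z=z_example)
```

Keeping the constraints separate from z mirrors the formulation above: the user-defined thresholds delimit the feasible region, and the objective only ranks the sets that fall inside it.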



D. Phase II: Mapping to Requirements 

The final activity of the CE process, as shown in Figure 1,
is the mapping of directives to system requirements. We
claim that there is a one-to-many mapping from a directive to
a requirement. Both entities are defined at a reductionistic 
level of abstraction and share the objective of signifying the 
characteristics of a system. Therefore, we hypothesize that the 
process of mapping is uncomplicated. 

IV. Validation 

In this section we describe our ongoing research activities 
to validate the efficacy of the CE approach for constructing 
change-tolerant systems. In general the best approach to as- 
serting the validity of CE is to employ a longitudinal study 
spanning the development of a complex emergent system. 
However, such an approach warrants a lengthy time-period. 
Alternatively, we choose to validate our theory on an existing 
system that exhibits the characteristics of a complex emergent 
system and possesses a change history archive. The following 
sections describe this system, examine its appropriateness 
for validation purposes, outline the validation procedures and 
discuss some preliminary observations. 

A. System Characteristics 

The system being used for validation purposes is Sakai 
Collaboration and Learning Environment, an open-source, 
enterprise-scale software application. It is being developed by 
an international alliance of several universities spanning four 
continents [34]. The current Sakai system consists of about 
80,000 lines of code and its complexity is further compounded
by distributed development. The high-level mission of this 
system is to realize specific research, learning, collaboration 
and academic needs of universities and colleges. The system is 
constantly evolving to accommodate the needs of its 300,000+ 
users. System increments that incorporate new functionalities 
are released on a yearly basis. Also, the overall system is 
envisioned to be used for an extended time-period. Hence, 
the Sakai system exhibits characteristics of complex emergent 
systems and appears suitable for the purpose of validating our 
CE approach. 

B. Outline of the Validation Approach 

We have constructed the FD graph, $G_{sk}$, for the Sakai
system based on extensive documentation of user needs, long- 
term feature requests and results of community polls. The 
graph has also been validated by the members of the Sakai 
project to ensure that it is a true decomposition of user 
needs. We have computed 85 valid slices from a possible 
1152921504606846976 ($2^{60}$) combinations of nodes. Our hypothesis
is that an optimal slice appropriately identified by CE,
say $S_{ce}$, is more change-tolerant than the slice implemented
in the actual Sakai system, $S_{sk}$. Both $S_{ce}$ and $S_{sk}$ are among
the set of valid slices previously determined from $G_{sk}$. To
test the hypothesis we examine the ripple effects of change in
both the slices. The comprehensive change history maintained
in the Sakai archives facilitates the ripple-effect analysis in $S_{sk}$.
We inspect the code to trace the effect of similar changes
in $S_{ce}$. Several different scenarios of change (modified
requirements, deleted needs, addition of new features) are to
be analyzed. Presently, we quantify the impact of change as the
number of affected entities. These entities can be requirements, 
implementation modules, system functionalities and so on. 

C. Preliminary Empirical Observations 

A preliminary analysis of $G_{sk}$ has resulted in several
informative observations related to the construction of change- 
tolerant systems. We outline four of them below: 

1) Common functionalities: The graph structure indicates
that there is a relationship between the number of intersection 
edges of a node and its coupling measure with other nodes. 
Recall that an intersection edge indicates common functionali- 
ties. In particular, the higher the number of intersection edges 
from an internal node, the greater is its coupling value. We 
observe that the addition of an intersection edge might provide 
a shorter path of traversal between directives, and thereby, 
result in increased coupling. To what extent, if any, can this 
observation be used in guiding the design of complex emergent 
systems? 

2) Factoring in Level of Abstraction: In certain cases, we 
observe that an optimal slice of the FD graph consists of nodes 
defined at the highest level of abstraction. This is because our 
cohesion and coupling measures are averages derived from 
bottom-up computations, and thereby, tend to identify nodes 
closest to the root as being optimal. In general, there is a 
relation between the abstraction level and the size (number of 
associated directives) of a node. This is exemplified by an FD 
graph that is a complete tree, where nodes at a lower level of 
abstraction are of a smaller size. From a software engineering 
perspective, it is prudent that the Capabilities of a system are 
not only highly cohesive and minimally coupled, but are also 
of reduced size. Hence, nodes of a lower abstraction that are 
marginally more coupled but are of a smaller size, are better 
suited for system development. Therefore, the abstraction 
level, ascertained from a top-down decomposition of the graph, 
should be utilized in conjunction with the bottom-up measures 
of cohesion and coupling to determine optimal slices. 

3) Coupling-Cohesion trend: The cohesion and coupling
metrics are computed independently of each other. However,
we observe that on average, in a slice, nodes (viz. Capabilities)
that have high cohesion values also exhibit low coupling
with other nodes. This is in line with desirable
software engineering characteristics.

4) Schedule: Coupling between two nodes is determined in 
part by the sizes of each node. Therefore, the coupling between 
two nodes can be asymmetric. For example, in Figure 3, $n_1$
and $n_9$ are of different sizes and so, $Cp(n_1, n_9) \neq Cp(n_9, n_1)$.
Consequently, the coupling measure can assist in choosing 
an implementation order of Capabilities that potentially min- 
imizes the impact of change. Note that permuting a slice of 
nodes produces different sequences of implementation, each 
of whose coupling value can be computed. This observation
implies that the coupling measure can help define the criteria
for determining an implementation schedule, which is a
function of order and time as discussed in Section III-C.
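
The size dependence noted in this observation can be made explicit with a short derivation. It is a sketch that assumes the coupling definitions given earlier, namely $Cp(d_i, d_j) = P(d_j)/dist(d_i, d_j)$ with $P(d_j) = 1/|D_q|$, and that dist is symmetric because it is measured on the undirected graph G'. The shorthand $\Sigma_{pq}$ is introduced here for illustration only.

```latex
\begin{align*}
\Sigma_{pq} &= \sum_{d_i \in D_p} \sum_{d_j \in D_q} \frac{1}{dist(d_i, d_j)} \\
Cp(p, q) &= \frac{1}{|D_p|\,|D_q|} \sum_{d_i \in D_p} \sum_{d_j \in D_q}
            \frac{1/|D_q|}{dist(d_i, d_j)}
          = \frac{\Sigma_{pq}}{|D_p|\,|D_q|^{2}} \\
Cp(q, p) &= \frac{1}{|D_q|\,|D_p|} \sum_{d_j \in D_q} \sum_{d_i \in D_p}
            \frac{1/|D_p|}{dist(d_j, d_i)}
          = \frac{\Sigma_{pq}}{|D_p|^{2}\,|D_q|} \\
\frac{Cp(p, q)}{Cp(q, p)} &= \frac{|D_p|}{|D_q|}
\end{align*}
```

Under these assumptions the two values coincide only when both nodes have the same number of directives. Otherwise the node with more directives has the larger coupling value toward the other.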

V. Conclusion 

Complex emergent systems need to be change-tolerant, 
as they have lengthy development cycles. Requirements and 
technology often evolve during such development periods, 
and thereby, inhibit a comprehensive up-front solution spec- 
ification. Failing to accommodate changed requirements or 
to incorporate latest technology results in an unsatisfactory 
system, and thereby, invalidates the huge investments of time 
and money. Recent history of system failures provides ample 
evidence to support this fact. We propose an alternative 
approach of development termed CE to develop change- 
tolerant systems. It is a scientific, disciplined and deliberate 
process for defining Capabilities as functional abstractions, 
which are the building blocks of the system. Capabilities are 
designed to exhibit high cohesion and low coupling, which 
are also desirable from a software engineering perspective, 
to promote change-tolerance. Also, the CE process employs an
MDO approach for selecting an optimal set of Capabilities that
accommodates the constraints of technology advancement and 
development schedule. CE is a recursive process of selection, 
optimization, reorganization and hence, the stabilization of 
Capabilities. We envision that the Capabilities based approach 
provides a high level development framework, for complex 
emergent systems, accommodating change and facilitating 
evolution with minimal impact. 